Guinea baboons are strategic cooperators

Humans are strategic cooperators; we make decisions on the basis of costs and benefits to maintain high levels of cooperation, and this is thought to have played a key role in human evolution. In comparison, monkeys and apes might lack the cognitive capacities necessary to develop flexible forms of cooperation. We show that Guinea baboons (Papio papio) can use direct reciprocity and partner choice to develop and maintain high levels of cooperation in a prosocial choice task. Our findings demonstrate that monkeys have the cognitive capacities to adjust their level of cooperation strategically using a combination of partner choice and partner control strategies. Such capacities were likely present in our common ancestor and would have provided the foundations for the evolution of typically human forms of cooperation.

Automated Learning Devices for Monkeys (ALDMs; 44, 45) are fully automatic operant conditioning test systems that can be used for testing nonhuman primates in social settings without the need to capture or isolate them. They use an automatic radio frequency identification device (RFID) implanted in each forearm of the monkeys to gather information about the location, identity, and current task of specific individuals. S-ALDMs are a modified version of the original ALDM developed by J.F.: they are pairs of ALDMs that allow reciprocal visual access between two individuals and their screens (46). This is achieved using a transparent partition when visual access is required, or an opaque partition when it is not. In ALDM tasks, a correct choice results in the delivery of a reward (grains of wheat), accompanied by a black screen. An incorrect choice does not result in the delivery of a reward, and a green time-out screen is displayed for 3 s. Importantly, when two individuals participate in an S-ALDM task with a transparent partition, they can see the respective outcomes of their trials: they can see the black screen, hear the delivery of the reward, and see their partner eating when they are successful, and they can see the green screen when they are not.
The operating principles of the S-ALDMs are as follows. When one monkey is identified, a blue screen appears for a maximum of four seconds. During this delay, if another individual is identified in the neighboring S-ALDM, a "dual task" is launched for both (Tab. S1). If no partner is identified in the neighboring S-ALDM during the four-second delay, a "filler task" starts (Fig. S1).
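The launch rule described above can be illustrated with a minimal sketch. It is not the authors' test program: the `Identification` record, the booth labels, and the function name are hypothetical, and only the four-second window comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

PARTNER_WINDOW_S = 4.0  # from the text: the blue screen lasts at most four seconds


@dataclass
class Identification:
    """One RFID identification event (hypothetical record layout)."""
    monkey: str
    booth: str      # which side of the S-ALDM pair, e.g. "left" or "right"
    time_s: float   # time of identification, in seconds


def select_task(first: Identification, second: Optional[Identification]) -> str:
    """Return "dual" if a partner is identified in the neighboring booth
    within the waiting window, otherwise "filler"."""
    if second is None:
        return "filler"
    in_neighboring_booth = second.booth != first.booth
    delay = second.time_s - first.time_s
    if in_neighboring_booth and 0.0 <= delay <= PARTNER_WINDOW_S:
        return "dual"
    return "filler"
```

For example, a partner identified 2.5 s after the first monkey would start a dual task, whereas a partner arriving after 6 s would leave the first monkey with the filler task.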
Figure S1: Filler task used in the experiments. The filler task was an adaptation of the Fitts task (48), which is used to measure the time required to go from one point to another. Baboons had to touch a square appearing successively on the right and left of the screen. At the start of a trial, a square appeared on one side of the screen; when touched, it disappeared and reappeared on the opposite side of the screen. This was repeated twice, so that monkeys had to touch the square four times in alternating positions. If successful, baboons were rewarded; if a square was missed, the trial stopped and was considered a failure, a green screen was displayed for 3 s, and no reward was provided. The function of the filler task was to let the baboons use the S-ALDM in the absence of another individual. The data from the filler task were not analyzed.

Data analysis
Due to technical errors, 0.34% of trials happened with a misidentification of the partner (noted "IdError" in the data). We performed all analyses reported below with and without these trials and found no substantial difference. Since the misidentification concerned the partner and the choice of the focal individual is still known, we decided to report the results based on the full set of trials to maintain a balanced number of trials between conditions. All the data and analysis code needed to reproduce the analyses and figures below are available at https://osf.io/dmujs/

Experiment 1 : test condition
During the dual task, a fixation cross appeared on each screen. The roles of the two individuals were then randomly chosen by the test program for each trial. The individual selected as the actor had to choose among three images randomly predefined as the prosocial choice, the selfish choice, and the control choice. The positions of the images (top, middle, or bottom) were randomized for each trial. The other individual, the receiver, was in a waiting position with a black screen. As soon as the actor made their choice, the outcome of the trial was determined first for the receiver and then, 1500 ms later, for the actor. This delay was introduced to give the actor time to see the consequences of their choice for the receiver, and to prevent them from focusing exclusively on their own outcome (reward or time out). Prosocial and selfish trials are illustrated in Figure S2. When the actor selected the control stimulus, a green screen was displayed for the receiver, followed by a green screen for the actor, and neither of them received a food reward. The test condition started with a baseline phase in which the stimuli were presented, followed by a reverse phase during which the valence of the prosocial and selfish stimuli was reversed. Our prediction was that if monkeys are prosocial, they should choose the prosocial stimulus in both the baseline and reverse phases. As in previous experiments, we defined a threshold of 80% prosocial choice in a set of 50 trials as the criterion to determine a change (e.g. 46, 47).
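The 80% criterion over blocks of 50 trials can be made concrete with a short sketch. It assumes that blocks are consecutive and non-overlapping, that "80%" means at least 40 prosocial choices out of 50, and that an incomplete final block is ignored; the function name and the choice labels are hypothetical.

```python
from typing import List, Optional

BLOCK_SIZE = 50    # trials per block, from the text
CRITERION = 0.80   # proportion of prosocial choices, from the text


def first_block_at_criterion(choices: List[str]) -> Optional[int]:
    """Return the 1-based index of the first complete block of 50 trials with
    at least 80% prosocial choices, or None if no block reaches it."""
    n_blocks = len(choices) // BLOCK_SIZE  # an incomplete trailing block is ignored
    for b in range(n_blocks):
        block = choices[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
        prosocial = sum(1 for c in block if c == "prosocial")
        if prosocial / BLOCK_SIZE >= CRITERION:
            return b + 1
    return None
```

With this definition, an individual that chooses the selfish stimulus throughout a first block and then makes 40 prosocial choices in the second block reaches criterion in block 2.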

Experiment 1 : Ghost control condition
During the ghost control condition, we closed one of the two accesses to the S-ALDM, so that only one monkey could use the S-ALDM, with no partner present. When a trial started, the individual was automatically selected as the actor, and the 'receiver' was simulated by the computer. The trial continued as in the test condition: the fixation cross appeared, followed by three different images with the same outcomes (prosocial, selfish, and control). The ghost control condition also started with a baseline phase, followed by a reverse phase. The main objective of this condition was to establish the probability that monkeys would choose the prosocial option in both the baseline and reverse phases in the absence of a partner. Such a choice could have happened for the following reasons:
- The green screen appearing on the adjacent screen during a selfish choice could be a negative cue for the actor and be avoided, since the monkeys are accustomed to the association of the green screen with a negative outcome (49).
- The sound of grain falling into the adjacent feeder might act as a positive reinforcement, since monkeys are used to hearing this sound when they are rewarded.
- Finally, other unknown factors could have influenced the selection of the different stimuli.
Note that during the ghost phase, the 'ghost' S-ALDM delivered rewards, but to avoid an accumulation of rewards in the 'ghost' S-ALDM we redirected the rewards into an opaque container.
In contrast to the experimental phase, our prediction was that monkeys would not change their response between the baseline and reverse conditions and therefore not choose the prosocial response in both conditions.

Results
Figures S3 and S4 show the results of the test and control conditions for the baseline and reverse phases for all individuals (baseline phase in red, reversal phase in blue). During the control condition, no baboon adopted a prosocial response in both the baseline and reverse phases, whereas eight did during the test condition (a significant difference from the control condition: binomial test, 0/18 vs. 8/18, p < 0.001). However, the number of trials was larger in the test condition than in the control condition. When we limited the analysis to the same number of complete blocks of 50 trials done by the same individual in the reverse phase of the test and control conditions, 5 individuals passed our 80% criterion before reaching the number of blocks done in the reverse control condition, still a strongly significant difference (binomial test, 0/18 vs. 5/18, p < 0.001).

Experiment 2 : Training
During experiment 2, we wanted to challenge the monkeys' capacity to maintain cooperation by gradually introducing non-rewarded PCT trials (NR-PCT), with one stimulus rewarding only the receiver (0-1) and the other giving no rewards (0-0; see main text). However, before starting Experiment 2, we wanted to make sure that the monkeys would not choose the prosocial stimulus by default at the start of the experiment. To remain conservative, we therefore decided to train them to choose the stimulus that would correspond to the selfish response rather than the one used for the prosocial response. In this training phase, the two stimuli were presented in a forced choice task. Baboons were rewarded if they chose the stimulus that would later become the selfish one and not rewarded if they chose the stimulus that would later become the prosocial one.

Experiment 2: Testing
During testing, we progressively introduced NR-PCT trials among the rewarded PCT trials (R-PCT) to give monkeys time to adapt to the new non-reinforced trials. Every two days, the proportion of NR-PCT trials increased, from 0% to 100% (see main text). Crucially, at the end of testing, only NR-PCT trials remained and actors could therefore no longer receive rewards directly.
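As an illustration of this progressive schedule, the sketch below computes the proportion of NR-PCT trials for a given testing day and draws a trial type accordingly. Only the two-day interval and the 0% to 100% range come from the text. The 10-point step and the function names are assumptions made for the example, since the actual increments are given in the main text.

```python
import random


def nr_pct_proportion(day: int, step_percent: int = 10, days_per_step: int = 2) -> float:
    """Proportion of NR-PCT trials on a testing day (day 0 = first day).

    step_percent is an ASSUMED increment; the text only states that the
    proportion increased every two days, from 0% to 100%.
    """
    if day < 0:
        raise ValueError("day must be non-negative")
    percent = min(100, (day // days_per_step) * step_percent)
    return percent / 100


def draw_trial_type(day: int, rng: random.Random) -> str:
    """Draw "NR-PCT" with the scheduled probability, otherwise "R-PCT"."""
    return "NR-PCT" if rng.random() < nr_pct_proportion(day) else "R-PCT"
```

Under these assumed parameters the schedule would reach 100% NR-PCT trials on day 20, after which every drawn trial is an NR-PCT trial.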

Results : Training
For the training phase, the eighteen participating individuals all reached 80% success in a block of 50 trials (mean number of blocks to reach the criterion: 1.2, min = 1, max = 2).

Results : Testing
We separately analyzed blocks of 50 NR-PCT trials and of 50 R-PCT trials for the 8 prosocial monkeys identified in experiment 1 (Fig S5). We found that all reached our criterion of 80% prosocial choice in a block of 50 trials at least once in each condition.

Analysis of behavioral strategies : Reciprocity
We examined the probability that a monkey chose the prosocial stimulus depending on their partner's previous response. To do so, we selected all cases in which the partners were the same but the roles were exchanged between two successive trials performed less than 15 s apart. For experiment 1, we selected trials that were done after the baboons had reached 80% prosocial choice. Given the high level of prosocial choice, we found no evidence of reciprocity: the eight prosocial individuals made their choice regardless of their partner's previous behavior (Fig S6, Tab S2).
Table S2: Analysis of reciprocity for experiment 1. GLMM results of the binomial model including as dependent variable the binary choice of the actor (Selfish = 0, Prosocial = 1), depending on the choice of the partner in the previous trial (Selfish or Prosocial) and including a random intercept for the actor, accounting for repeated measures. GLMM analysis was performed using R lme4 package (50).
Regarding experiment 2, we found that for NR-PCT trials, all 8 individuals were more likely to choose the prosocial response after their partner had done the same compared to when the partner had chosen the selfish response (Fig S7, Tab S3).
Table S3: Analysis of reciprocity for experiment 2. GLMM results of the binomial model including as dependent variable the binary choice of the actor (Selfish = 0, Prosocial = 1), depending on the choice of the partner in the previous trial (Selfish or Prosocial) and including a random intercept for the actor, accounting for repeated measures. GLMM analysis was performed using R lme4 package (50).
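The trial selection used for the reciprocity analysis can be sketched as follows. This is an illustration, not the authors' analysis code (which is available at the OSF link given in the Data analysis section): the `Trial` record and function names are hypothetical, trials are assumed to come from a single chronological stream, and the sketch reports raw conditional proportions rather than fitting the GLMM.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

MAX_GAP_S = 15.0  # from the text: successive trials less than 15 s apart


@dataclass
class Trial:
    """One completed trial (hypothetical record layout)."""
    time_s: float
    actor: str
    receiver: str
    choice: str  # "prosocial" or "selfish"


def reciprocity_pairs(trials: List[Trial]) -> List[Tuple[str, str]]:
    """Return (partner's previous choice, actor's current choice) for each
    pair of successive trials with the same partners and exchanged roles."""
    ordered = sorted(trials, key=lambda t: t.time_s)
    pairs = []
    for prev, cur in zip(ordered, ordered[1:]):
        roles_exchanged = prev.actor == cur.receiver and prev.receiver == cur.actor
        close_in_time = (cur.time_s - prev.time_s) < MAX_GAP_S
        if roles_exchanged and close_in_time:
            pairs.append((prev.choice, cur.choice))
    return pairs


def prosocial_rate_by_previous(pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Proportion of prosocial choices, split by the partner's previous choice."""
    counts: Dict[str, Tuple[int, int]] = {}
    for prev_choice, cur_choice in pairs:
        n, k = counts.get(prev_choice, (0, 0))
        counts[prev_choice] = (n + 1, k + (cur_choice == "prosocial"))
    return {prev: k / n for prev, (n, k) in counts.items()}
```

A higher prosocial rate after a prosocial partner choice than after a selfish one would be the descriptive signature of reciprocity that the GLMM then tests with a random intercept per actor.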

Analysis of behavioral strategies : partner choice
In addition, we found that in both experiments prosocial monkeys were more likely than non-prosocial monkeys to change partner when their partner had chosen the selfish stimulus in the previous trial (Fig. S8).
This form of partner choice existed in both prosocial and non-prosocial monkeys but was significantly stronger in the former (Tab S4).
Table S4: Results of the analysis of partner change in experiments 1 and 2. GLMM results of the binomial model including as dependent variable whether or not the previous trial was with the same partner (Same = 0, Change = 1), depending on the choice of the partner in the previous trial (Selfish or Prosocial), with an interaction depending on whether the actor is part of the group of prosocial monkeys or not. The model also included a random intercept for the actor, accounting for repeated measures. GLMM analysis was performed using R lme4 package (50).
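One way to write the model described in the Table S4 caption is given below. The symbols are introduced here for illustration only, and the choice of reference levels (Prosocial previous choice and non-prosocial actor coded as 0) is an assumption, not taken from the source.

```latex
\operatorname{logit} P(\mathrm{Change}_{ij} = 1)
  = \beta_0 + \beta_1 S_{ij} + \beta_2 G_j + \beta_3 S_{ij} G_j + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

Here $\mathrm{Change}_{ij}$ indicates a change of partner for trial $i$ of actor $j$, $S_{ij} = 1$ if the partner chose the selfish stimulus in the previous trial, $G_j = 1$ if actor $j$ belongs to the group of prosocial monkeys, and $u_j$ is the random intercept for the actor. Under this coding, a positive $\beta_3$ would correspond to partner choice being stronger in prosocial monkeys.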
This partner choice created a situation in which there was a positive correlation between the number of trials performed by a pair of individuals and their joint level of prosociality (Fig S9).
Figure S9, panels C and D. C: Group correlation for every pair (prosocial and non-prosocial) with at least 30 trials during experiment 1 (R-PCT). D: Group correlation for every pair (prosocial and non-prosocial) with at least 30 trials during experiment 2 (NR-PCT).
We found a reliable positive relationship for prosocial monkeys (Fig. S9 A & B). Since the data are non-independent, we used a non-parametric Spearman test that showed a positive relationship between the number of trials and the proportion of prosocial choice for pairs of individuals (Fig. S9 C & D; Experiment 1, R-PCT trials, Spearman rho = 0.31, p < 0.001; Experiment 2, NR-PCT trials, Spearman rho = 0.32, p < 0.001).
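For readers unfamiliar with the statistic, the sketch below computes Spearman's rho as the Pearson correlation of average ranks. It is a plain-Python illustration with hypothetical function names and example values, and it does not compute the p-value reported above.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)
```

In the analysis above, `x` would hold the number of trials performed by each pair and `y` the proportion of prosocial choices of that pair, restricted to pairs with at least 30 trials.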

Analysis of behavioral strategies: Interrupted trial strategy
During the experiment, if a response was not obtained within a certain duration, the trial was considered 'interrupted' and terminated (without delivery of a reward or presentation of a time out).
There were two moments during which a trial could be interrupted (Fig S10):
- when there was no response during the fixation cross, possible for both the actor and the receiver
- when there was no response during the choice screen, possible for the actor only
We noticed that the rate of interrupted trials almost doubled between the two experiments for prosocial monkeys (7% [min = 5%, max = 9%] of interrupted trials in experiment 1 and 13% [min = 5%, max = 22%] in experiment 2). We compared the proportion of interrupted trials between the fixation cross and the choice screen, for prosocial and non-prosocial individuals (Fig. S11). This proportion is close to 50% for non-prosocial individuals and slightly larger for prosocial individuals during experiment 1 (Tab. S5). We find similar results during experiment 2, except that the proportion of interrupted trials during the choice screen increased sharply for prosocial monkeys when they were in the role of actor (from mean = 0.62 [SE = 0.04] to mean = 0.82 [SE = 0.05]).
Table S5: Results of the interrupted trials analysis. GLMM results of the binomial model including as dependent variable whether or not the trial was interrupted during the choice screen (No = 0, Yes = 1), depending on the role of the individual (Receiver or Actor), with an interaction depending on whether the actor is part of the group of prosocial monkeys or not. The model also included a random intercept for the actor, accounting for repeated measures. GLMM analysis was performed using R lme4 package (50).
This increase in interrupted trials, specific to prosocial individuals and to the choice screen, shows that prosocial individuals changed their strategy. Given that monkeys changed partners more often after an interrupted trial (Fig. S12 and Tab S6) and were more likely to abort a trial when the partner had previously chosen a selfish response (Fig. S13 and Tab S7), this shows another form of reciprocity and partner choice: when their partner did not make the prosocial choice, prosocial monkeys were more likely to not respond when their turn came (i.e. to interrupt the trial) and to change partner.
Table S6: Analysis of the probability to change partner after a completed or interrupted trial. GLMM results of the binomial model including as dependent variable the binary partner change (No change = 0, Change = 1), depending on the previous trial interrupted status (Completed or Interrupted) and including a random intercept for the actor, accounting for repeated measures. GLMM analysis was performed using R lme4 package (50).
Table S7: Analysis of the probability to interrupt a trial depending on the partner's previous behaviour. GLMM results of the binomial model including as dependent variable the binary interrupted trial variable (Completed = 0, Interrupted = 1), depending on the choice of the partner in the previous trial (Selfish or Prosocial) and including a random intercept for the actor, accounting for repeated measures. GLMM analysis was performed using R lme4 package (50).
During experiment 3, we wanted to explore the possibility that baboons would be willing to pay a small cost to maintain cooperation. Our first aim was to verify that they would differentiate a more costly choice from a less costly one, and that they would preferentially choose the latter. We started with a nonsocial task (Fig. S16) in which baboons were separated from their partner by an opaque partition.
When a trial started, they could choose between three different stimuli. The control stimulus delivered no reward, and a green time-out screen was displayed for 3 s. The non-costly stimulus delivered a reward.
The costly stimulus had to be touched twice, with a variable delay between the two touches, to deliver a reward (the position of the stimulus changed randomly between the two touches). We manipulated the delay of the costly stimulus, starting with a 5000ms delay: a monkey had to wait 5000ms between the first and the second touch, whereas the non-costly choice was immediately rewarded. We then progressively decreased the delay to determine when they would stop avoiding the costly stimulus. The following delays were used, in that order: 5000ms, 4000ms, 3000ms, 2000ms, 1000ms, 500ms, 50ms. To obtain a reliable estimate, we used six different pairs of stimuli for each delay.
Figure S14: Trials in the preliminary experiment. Non-costly trial: when selected, the stimulus directly delivered the reward. Costly trial: when selected, a delay (between 5000 and 50ms) would be initiated before the same stimulus appeared in a different position. When selected a second time, it triggered the delivery of a reward. Control stimulus: when selected, it resulted in a 3 s green screen time out.
Baboons progressed through the experiment when they reached 80% success on one block of 50 trials or after 10 blocks without reaching the criterion. The results (Tab. S8) show that all prosocial monkeys chose the less costly stimulus on all 6 pairs for delays between 4 s and 1 s.
The costly-PCT task was similar to Experiment 1, with the exception that the prosocial stimulus had to be touched twice (in different positions) with a delay of one or three seconds (Fig. S17). We first performed a condition with a 1 s delay, then a 3 s delay, then a 3 s delay ghost condition (similar to the experiment 1 ghost condition but with the costly prosocial choice). We repeated each condition twice with different stimuli and for 10 blocks of 50 trials for each prosocial monkey (except ATMOSPHERE, who did not participate reliably in experiments at that time).
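The delay schedule and progression rule of the preliminary experiment described above can be summarized in a minimal sketch. The constants come from the text; the function names are hypothetical, and "success" per block is taken as given, since its exact definition is not restated here.

```python
from typing import Optional, Sequence

DELAYS_MS = [5000, 4000, 3000, 2000, 1000, 500, 50]  # order given in the text
CRITERION = 0.80   # 80% success on one block of 50 trials
MAX_BLOCKS = 10    # progress anyway after 10 blocks without criterion


def should_progress(block_success_rates: Sequence[float]) -> bool:
    """True once any block reached criterion or 10 blocks have been run."""
    reached = any(rate >= CRITERION for rate in block_success_rates)
    return reached or len(block_success_rates) >= MAX_BLOCKS


def next_delay(current_ms: int) -> Optional[int]:
    """Return the delay that follows current_ms in the schedule, or None
    if current_ms is the last delay."""
    idx = DELAYS_MS.index(current_ms)  # raises ValueError for unknown delays
    return DELAYS_MS[idx + 1] if idx + 1 < len(DELAYS_MS) else None
```

For instance, a baboon with block success rates of 0.50 and then 0.82 at the 5000ms delay would move on to the 4000ms delay after its second block.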

Experiment 3 : Results
We present the results of the 8 previously prosocial individuals, except for ATMOSPHERE, who did not participate reliably in experiments during this period.
During the 1000ms phase, 5/7 monkeys chose the prosocial stimulus above 80% in at least one block of 50 trials for the two sets of stimuli, and 2/7 monkeys reached criterion in only one of the two sets. In the 3000ms phase, 4/7 monkeys reached criterion in the two sets, 2/7 in one, and 1/7 in neither of the two.
Compared to experiment 1, this suggests that although prosocial monkeys kept a high rate of prosocial choice, some preferred to choose the non-costly stimulus at some point.
By comparison, in the 3000ms ghost phase, prosocial monkeys reached an average proportion of prosocial choice of 38% (s.e.: 9%, min = 3%, max = 75%); 1/7 reached criterion twice, 4/7 reached it once, and 2/7 never reached criterion. These results show that prosocial monkeys were more likely to choose the prosocial option in the 1 s and 3 s delay conditions than in the 3 s delay ghost control condition (Tab S9). However, the fact that some monkeys persisted in choosing the prosocial stimulus in the ghost condition despite the cost suggests that the delay represented only a small cost. Nonetheless, our results demonstrate that monkeys are willing to pay a small cost (a short delay) to sustain cooperation, a cost they are not ready to pay when there is no partner present.
In the model reported in Tab S9, we included a random intercept and slope depending on the number of repetitions for the actor and a random intercept and slope depending on the number of blocks for the actor to account for repeated measures (changes in the random structure did not affect the results qualitatively). GLMM analysis was performed using R lme4 package (50).

Effect of dominance
To determine the dominance hierarchy, we followed (51), using the data collected in experiment 2 (mean number of supplantations: 216, s.d.: 172, min = 9, max = 742). We found no evidence of a relationship between the difference in Elo score between the actor and the receiver and the proportion of prosocial choices made by the actor (Fig. S14). There was no reliable relationship for prosocial monkeys (

Effect of affiliative social network
We conducted behavioral observations throughout Experiment 2 to build the affiliative network of the group following Claidière, Gullstrand, Latouche and Fagot (52).We used a five-minute focal sampling method to collect 76.5 hours of behavioral observation including 2560 affiliative behaviors.
We found no strong correlation between the affiliative association index (51)

Figure S2: Prosocial and selfish trials. A: The actor makes the prosocial choice. The receiver is

Figure S3: Proportion of prosocial choices in each block of 50 trials during the test condition.

Figure S4: Proportion of prosocial choices in each block of 50 trials during the control condition.

Figure S5: Results of the test phase of experiment 2 for prosocial monkeys.Proportion of prosocial made their choice regardless of their partner's previous behavior (Fig S6, Tab S2).

Figure S6: Individual results for the eight prosocial individuals during experiment 1. Proportion of chosen the selfish response (Fig S7, Tab S3).

Figure S7: Individual results for the eight prosocial individuals during experiment 2. Proportion of

Figure S8: Proportion of trials in which partners change when the prosocial or non-prosocial actor

Figure S9: Correlation between the proportion of prosocial choice between pairs of individuals and

Figure S10: Interrupted trials. When there was no response within 8 secs during the fixation cross or

Figure S11: Proportion of interrupted trial during the choice screen with a prosocial or non-prosocial

Figure S12: Proportion of trials with a change in partners after a completed or interrupted trial. Grey

Figure S13: Proportion of trials with the same partners that are interrupted after a prosocial or

Figure S15: Trials in the costly pct-experiment. A: If the actor chooses the selfish stimulus, a green

Figure S18: Correlation between the proportion of prosocial choice between pairs of individuals and

Table S1: Characteristics of individuals participating in the study. In bold, prosocial individuals.
Social Automated Learning Devices for Monkeys (S-ALDM)

Table S2: Analysis of reciprocity for experiment 1. GLMM results of the binomial model including as

Table S8: Results of the preliminary experiment. For each individual and each delay, the number

Table S9: Analysis of prosocial choice depending on delay. GLMM results of the binomial model including as dependent variable the binary choice of stimuli (Non-prosocial = 0, Prosocial = 1), depending on the experimental condition (Test 1000ms delay, Test 3000ms and Ghost 3000ms delay as baseline).